This is an exploratory analysis of a data set that reports child maltreatment incidents in Little Rock from June 2015 to June 2018. We show here kernel density maps of the child maltreatment incidents as well as their distribution over time. We also provide some insight into how the different categories of allegations are associated with each other by finding “frequent itemsets”, and, finally, present the correlation structure showing how child maltreatment relates to a few built environment factors.
First, we geocode the addresses using Google’s API and plot the incidents on a map as a point process, using the latitudes and longitudes obtained through ggmap.
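The report geocodes through ggmap. The Python sketch below only illustrates the shape of that step and is not the report's code. It assumes Google's Geocoding web service with JSON output, where the first match's coordinates sit at `results[0].geometry.location`. The helper names and the API key are hypothetical, and the example parses a canned response instead of making a network call.

```python
import json
from urllib.parse import urlencode

# Google Geocoding web service, JSON output (an API key is required).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Request URL for geocoding one street address (hypothetical helper)."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})

def parse_lat_lng(response_text):
    """(lat, lng) of the first match in a Geocoding API JSON response, or None."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Canned response with illustrative coordinates; no network call is made here.
sample = '{"status": "OK", "results": [{"geometry": {"location": {"lat": 34.75, "lng": -92.29}}}]}'
coords = parse_lat_lng(sample)   # (34.75, -92.29)
```

Addresses the service cannot resolve come back with a non-`OK` status. Returning `None` for them lets those rows be dropped before mapping.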
The original data set contains several different city names in the reporting addresses. Hence, for this report, we keep only addresses with the city name “Little Rock” and use the Little Rock city boundaries to filter out addresses that fall outside LR. The map below shows the incident points, together with their kernel density estimate as a heatmap, after the points outside Little Rock have been removed.
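The two steps above are the boundary filter and the kernel density heatmap. The following is a minimal Python sketch of both, not the report's implementation, under these assumptions:

- Coordinates have already been projected to a planar system (e.g., feet), so Euclidean distances make sense.
- The city boundary is a single simple polygon.
- The kernel is an isotropic Gaussian with a hand-picked bandwidth.

The names `point_in_polygon` and `gaussian_kde_grid`, the square boundary and the points are all hypothetical.

```python
import numpy as np

def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def gaussian_kde_grid(points, xs, ys, bandwidth):
    """Isotropic 2-D Gaussian kernel density estimate evaluated on a grid.

    points: (N, 2) array of (x, y); xs, ys: 1-D grid coordinates.
    Returns an array of shape (len(ys), len(xs)).
    """
    gx, gy = np.meshgrid(xs, ys)                 # evaluation grid
    dx = gx[..., None] - points[:, 0]            # (ny, nx, N) offsets to every point
    dy = gy[..., None] - points[:, 1]
    kernel = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2))
    return kernel.sum(axis=-1) / (len(points) * 2 * np.pi * bandwidth**2)

# Hypothetical square "city limits" and incident points.
boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
raw = np.array([[2.0, 3.0], [4.0, 4.5], [12.0, 1.0], [5.0, 5.0]])
kept = np.array([p for p in raw if point_in_polygon(p[0], p[1], boundary)])
xs = ys = np.linspace(0, 10, 101)
density = gaussian_kde_grid(kept, xs, ys, bandwidth=1.0)
```

The bandwidth controls how smooth the heatmap looks. Mapping software usually chooses it by a rule of thumb, so a hand-picked value is only illustrative.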
The following figure shows the weekly and yearly child maltreatment incident counts, by year, over the study period, which runs from June 2015 to June 2018.
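Weekly and yearly counts are plain aggregations of the report dates. A minimal sketch, assuming the dates are available as Python `date` objects and that weeks are ISO weeks (the week definition used for the figure is not stated). The function names and dates are hypothetical.

```python
from collections import Counter
from datetime import date

def yearly_counts(dates):
    """Number of incidents per calendar year."""
    return Counter(d.year for d in dates)

def weekly_counts(dates):
    """Number of incidents per ISO week, keyed by (ISO year, ISO week number)."""
    return Counter(tuple(d.isocalendar())[:2] for d in dates)

# Hypothetical report dates, for illustration only.
dates = [date(2015, 6, 1), date(2015, 6, 3), date(2015, 6, 10), date(2016, 1, 4)]
by_year = yearly_counts(dates)   # 2015: 3, 2016: 1
by_week = weekly_counts(dates)   # the first two dates fall in the same ISO week
```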
This animation shows how the incident locations changed over time, plotting the points on the map of LR year by year from 2015 to 2018. Note that 2015 and 2018 have fewer events because the data cover only about half of each of those years. This does not imply a lower rate of child maltreatment in those years.
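The partial-year caveat can be made concrete by dividing each year's count by the number of months actually observed. A minimal sketch with these assumptions:

- The study window covers June 2015 through June 2018 inclusive; the exact start and end days are not stated.
- The function names are hypothetical.
- The yearly counts in the example are made up.

```python
from collections import Counter

def months_per_year(start, end):
    """Number of observed calendar months in each year, for inclusive (year, month) bounds."""
    covered = Counter()
    year, month = start
    while (year, month) <= end:
        covered[year] += 1
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return covered

def monthly_rate(yearly_counts, start, end):
    """Average incidents per observed month, which is comparable across partial years."""
    covered = months_per_year(start, end)
    return {year: yearly_counts.get(year, 0) / covered[year] for year in covered}

# Study window June 2015 to June 2018, assuming both end months are fully observed.
covered = months_per_year((2015, 6), (2018, 6))   # {2015: 7, 2016: 12, 2017: 12, 2018: 6}
# Made-up yearly counts: the partial years have smaller totals but the same monthly rate.
rates = monthly_rate({2015: 70, 2016: 120, 2017: 120, 2018: 60}, (2015, 6), (2018, 6))
```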
The child maltreatment data contain a number of binary variables for different types of allegations. A first step toward understanding the patterns among these categories is to investigate which categories or sub-types of child maltreatment co-occur and which are more exclusive. To this end, we use a data mining technique called “frequent itemset mining”, which is typically used in market basket analysis, click-stream analysis and web link analysis. The method looks for regularities in a binary occurrence matrix. In retail, for example, it is used to find patterns in the shopping behavior of customers of supermarkets, mail-order companies, online shops, etc.
More specifically, it is used to find sets of products that are frequently bought together. The patterns that are uncovered are expressed as association rules, for example: If a customer buys bread and wine, then she/he will probably also buy cheese.
Formally, our goal is to identify “building blocks” of child maltreatment from the binary incidence variables, observed as an \(m \times n\) sparse binary matrix, where \(m\) and \(n\) denote the number of observations and the number of distinct allegation types, respectively. The idea is to treat these data as transaction data: each observation is a transaction and each allegation type is an item. In the language of “itemset mining”, we have an item set \(I\) and a transaction set \(D\), and each transaction in \(D\) contains a subset of the items in \(I\). In our case, each subject may have 1’s in a subset of the columns.
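The mapping from the binary matrix to transactions is mechanical: each row becomes the set of column names coded 1. A minimal sketch follows. The function name and the example rows are hypothetical. The column name “Broad-Allegation-Neglect” is an illustrative stand-in, not necessarily a variable in the data.

```python
def to_transactions(matrix, item_names):
    """Turn an m x n 0/1 matrix (rows = reports, columns = allegation types)
    into m transactions, each the set of items coded 1 in that row."""
    return [frozenset(name for name, flag in zip(item_names, row) if flag)
            for row in matrix]

# Illustrative column names and rows (not the actual data).
items = ["Sexual Contact", "Broad-Allegation-Sexual-Abuse", "Broad-Allegation-Neglect"]
matrix = [
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
transactions = to_transactions(matrix, items)
```

Because the matrix is sparse, storing each row as the set of its 1’s is much more compact than storing the full row.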
First, we define some preliminaries and terminology used to explain the results. Note that these terms were developed by researchers in data mining and computer science, so a “rule” does not imply any causal link.
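For reference, the three measures used below have standard definitions. We assume they match the ones computed in this analysis. For itemsets \(X, Y \subseteq I\) and the transaction set \(D\):

\[
\operatorname{supp}(X) = \frac{|\{T \in D : X \subseteq T\}|}{|D|}, \qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}.
\]

If \(X\) and \(Y\) occurred independently, the lift would be 1. Values above 1 mean the items co-occur more often than independence predicts.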
An association rule is a rule that meets user-specified minimum support and minimum confidence thresholds. To select the interesting association rules, we further filter (or rank) the rules by an additional interest measure, e.g., lift. There are other interest measures, such as the chi-square measure, conviction and leverage, but for the sake of simplicity we do not discuss them in detail.
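A minimal Python sketch of these measures and of the support/confidence filter follows, together with the chi-square statistic reported later. It assumes transactions are stored as Python sets of item names. The function names and the toy items "A", "B", "C" are hypothetical. The chi-square function uses the plain 2 × 2 contingency-table statistic without continuity correction, so it may differ in detail from the statistic reported by the software used in the analysis.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(lhs, rhs, transactions):
    """Support, confidence and lift of the rule lhs => rhs."""
    s_lhs = support(lhs, transactions)
    s_rhs = support(rhs, transactions)
    s_both = support(set(lhs) | set(rhs), transactions)
    confidence = s_both / s_lhs if s_lhs else 0.0
    lift = confidence / s_rhs if s_rhs else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}

def chi_square(lhs, rhs, transactions):
    """Pearson chi-square statistic of the 2 x 2 table 'contains lhs' vs 'contains rhs'."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    a = sum(lhs <= t and rhs <= t for t in transactions)      # both
    b = sum(lhs <= t and not rhs <= t for t in transactions)  # lhs only
    c = sum(rhs <= t and not lhs <= t for t in transactions)  # rhs only
    d = n - a - b - c                                         # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def is_association_rule(measures, min_support, min_confidence):
    """A rule qualifies if it meets both user-specified thresholds."""
    return measures["support"] >= min_support and measures["confidence"] >= min_confidence

# Toy example with four transactions.
transactions = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
m = rule_measures({"A"}, {"B"}, transactions)
```

Here `{A} => {B}` has support 0.5, confidence 2/3 and lift 4/3. B appears in half of all transactions but in two thirds of those containing A.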
To see which items are most frequent in the data set, we use itemFrequencyPlot. To reduce the number of items shown, we plot only items with support greater than 5% (set via the support parameter).
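The quantity such a plot displays is simply each single item’s support. A minimal sketch that computes it directly, assuming transactions are Python sets and the 5% cut-off is a strict inequality (“greater than 5%”, as stated above). The function name and toy data are hypothetical.

```python
from collections import Counter

def item_frequencies(transactions, min_support=0.05):
    """Support of each single item, keeping items with support above `min_support`,
    sorted from most to least frequent (the quantity an item-frequency bar plot shows)."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    kept = [(item, c / n) for item, c in counts.items() if c / n > min_support]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy data: "A" in 19 of 20 transactions, "C" in 5, "B" in 1 (exactly 5%, so dropped).
toy = [{"A"}] * 14 + [{"A", "C"}] * 5 + [{"B"}]
freqs = item_frequencies(toy)
```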
We find all rules with a minimum support of 1% and a minimum confidence of 0.5, using the Apriori algorithm [1]. One way to identify interesting association rules is to look at rules involving known relationships and study their strength through measures of interest such as lift, confidence and support. The table below shows the top rules ranked by support, together with other measures of interest such as the chi-square test statistic of association.

For example, in that table the two sub-categories “Sexual Contact” and “Broad-Allegation-Sexual-Abuse” co-occur in 10% of all cases, and the corresponding lift is 7.23. These two items therefore co-occur about 7 times more often than would be expected if they were independent. This is also reflected in the high chi-square test statistic (3591.31). The items on the right-hand side (broad allegations) appear to be the broader parent categories of the sub-categories on the left-hand side of these rules.
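A compact sketch of the Apriori search and of rule generation, using the thresholds stated above (support ≥ 1%, confidence ≥ 0.5). It is written for illustration and is not the implementation used in the analysis. For simplicity, rules are restricted to a single item on the right-hand side, and transactions are assumed to be sets of item names. The function names and toy data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Level-wise search: frequent k-itemsets are joined into (k+1)-candidates, candidates
    with an infrequent k-subset are pruned (downward closure), and survivors are counted.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(candidates):
        return {c: sum(c <= t for t in transactions) / n for c in candidates}

    singletons = {frozenset([i]) for t in transactions for i in t}
    level = {c: s for c, s in count(singletons).items() if s >= min_support}
    frequent = dict(level)
    k = 1
    while level:
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        level = {c: s for c, s in count(candidates).items() if s >= min_support}
        frequent.update(level)
        k += 1
    return frequent

def generate_rules(frequent, min_confidence):
    """Rules lhs => {item} with confidence >= min_confidence, sorted by support."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            lhs, rhs = itemset - {item}, frozenset([item])
            confidence = supp / frequent[lhs]      # lhs is frequent by downward closure
            if confidence >= min_confidence:
                lift = confidence / frequent[rhs]
                rules.append((lhs, rhs, supp, confidence, lift))
    return sorted(rules, key=lambda r: r[2], reverse=True)

# Toy transactions with the thresholds used in the text.
toy = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
frequent = apriori(toy, min_support=0.01)
rules = generate_rules(frequent, min_confidence=0.5)
```

The pruning step is what makes Apriori practical. A candidate is counted only if all of its smaller subsets are frequent, since no superset of an infrequent itemset can be frequent.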
Here we show a few plots to understand these rules better.
Recall that for a rule \(X \Rightarrow Y\), \(X\) is called the antecedent (left-hand side, LHS) and \(Y\) the consequent (right-hand side, RHS).
This plot shows the rules (or itemsets) as a graph, with items as labeled vertices and rules (or itemsets) as vertices connected to the items by arrows. For a rule, each LHS item has an arrow pointing to the vertex representing the rule, and the rule vertex has an arrow pointing to the RHS item.
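The graph structure just described is a directed edge list: each LHS item points to the rule vertex, and the rule vertex points to each RHS item. The figure itself comes from the plotting tool used in the analysis. The sketch below only shows the underlying structure; the function name and the rule-vertex labels are hypothetical.

```python
def rule_graph_edges(rules):
    """Directed edges of a rule graph: each LHS item -> rule vertex -> each RHS item.

    `rules` is a list of (lhs_items, rhs_items) pairs; rule vertices are labelled
    "rule 1", "rule 2", ... in list order.
    """
    edges = []
    for k, (lhs, rhs) in enumerate(rules, start=1):
        rule_vertex = f"rule {k}"
        edges.extend((item, rule_vertex) for item in sorted(lhs))
        edges.extend((rule_vertex, item) for item in sorted(rhs))
    return edges

# Illustrative rule taken from the discussion above.
edges = rule_graph_edges([({"Sexual Contact"}, {"Broad-Allegation-Sexual-Abuse"})])
# [("Sexual Contact", "rule 1"), ("rule 1", "Broad-Allegation-Sexual-Abuse")]
```

Items shared by many rules end up with many edges. This is why frequently occurring consequents, such as the broad allegation categories, sit at the center of such a layout.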
In this graph, the three broad allegation categories (neglect, abuse and sexual abuse) form the core and appear to be the main distinguishing categories, while the items on the periphery are the sub-categories associated with them.
Here we show the correlation patterns between child maltreatment counts and a number of built environment factors, all counted within 500 ft grid cells. The factors considered are listed below, with basic summary statistics for their grid-level counts.
cm_lr_500ft
Dimensions: 14207 x 9
Duplicates: 14051

| No | Variable | Stats / Values | Freqs (% of Valid) |
|---|---|---|---|
| 1 | CM_Count [numeric] | Mean (sd): 0.3 (1.4); min < med < max: 0 < 0 < 52; IQR (CV): 0 (5.1) | 29 distinct values |
| 2 | Tattoo [numeric] | Min: 0; Mean: 0; Max: 1 | 0: 14195 (99.9%); 1: 12 (0.1%) |
| 3 | MobHomes [numeric] | Mean (sd): 0 (0); min < med < max: 0 < 0 < 2; IQR (CV): 0 (63.1) | 0: 14203 (100.0%); 1: 3 (0.0%); 2: 1 (0.0%) |
| 4 | MajDept [numeric] | Mean (sd): 0 (0.1); min < med < max: 0 < 0 < 2; IQR (CV): 0 (18.6) | 0: 14163 (99.7%); 1: 40 (0.3%); 2: 4 (0.0%) |
| 5 | LiqStore [numeric] | Mean (sd): 0 (0.1); min < med < max: 0 < 0 < 4; IQR (CV): 0 (16.2) | 0: 14150 (99.6%); 1: 3 (0.0%); 2: 51 (0.4%); 4: 3 (0.0%) |
| 6 | Hotel [numeric] | Mean (sd): 0 (0.1); min < med < max: 0 < 0 < 4; IQR (CV): 0 (18) | 0: 14154 (99.6%); 1: 43 (0.3%); 2: 8 (0.1%); 3: 1 (0.0%); 4: 1 (0.0%) |
| 7 | PublicHS [numeric] | Min: 0; Mean: 0; Max: 1 | 0: 14202 (100.0%); 1: 5 (0.0%) |
| 8 | BarbBeauty [numeric] | Mean (sd): 0 (0.4); min < med < max: 0 < 0 < 29; IQR (CV): 0 (12.7) | 11 distinct values |
| 9 | Banks [numeric] | Mean (sd): 0 (0.4); min < med < max: 0 < 0 < 9; IQR (CV): 0 (8.4) | 0: 13896 (97.8%); 1: 176 (1.2%); 2: 73 (0.5%); 3: 23 (0.2%); 4: 14 (0.1%); 5: 11 (0.1%); 6: 8 (0.1%); 7: 4 (0.0%); 8: 1 (0.0%); 9: 1 (0.0%) |

Figure 2.2 shows the correlation matrix.
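The grid-level analysis involves two steps: counting points per 500 ft cell, then correlating the per-cell counts. A minimal sketch follows, with these assumptions:

- Points are in projected coordinates measured in feet.
- The grid is anchored at a chosen origin.
- The correlation is Pearson’s; the coefficient behind the figure is not stated.

The function names and points are hypothetical. “CM_Count” and “Banks” are variable names from the table above.

```python
from collections import Counter
import numpy as np

CELL_FT = 500.0  # side length of a square grid cell, in feet

def grid_cell(x_ft, y_ft, origin=(0.0, 0.0), cell=CELL_FT):
    """(column, row) index of the grid cell containing a point given in projected feet."""
    return int((x_ft - origin[0]) // cell), int((y_ft - origin[1]) // cell)

def counts_per_cell(points, cells, origin=(0.0, 0.0), cell=CELL_FT):
    """Count points in each cell of `cells`; cells without any point get 0."""
    counts = Counter(grid_cell(x, y, origin, cell) for x, y in points)
    return np.array([counts[c] for c in cells], dtype=float)

def correlation_matrix(columns):
    """Pearson correlation matrix of {variable name: per-cell counts}.
    A column that is constant across cells yields NaN entries."""
    names = list(columns)
    data = np.column_stack([np.asarray(columns[n], dtype=float) for n in names])
    return names, np.corrcoef(data, rowvar=False)

# Hypothetical 2 x 2 grid and point locations.
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
cm = counts_per_cell([(10, 10), (499, 20), (520, 30), (100, 700)], cells)
banks = counts_per_cell([(30, 40), (60, 80), (600, 10)], cells)
names, r = correlation_matrix({"CM_Count": cm, "Banks": banks})
```

Empty cells must be kept as zeros. As the summary table shows, most cells have no incidents or facilities, and dropping them would change the correlations.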
Next, we look at how child maltreatment counts correlate with a few selected ACS variables at the tract level. The variables are listed below, grouped by the direction of their correlation with child maltreatment counts. Figure 3.2 shows the correlation heatmap. In it, the variables are ordered by a hierarchical clustering algorithm so that strongly correlated variables are grouped together.
Positive Correlation with child maltreatment counts
1. Population size (NL)
2. Population density (p = .052)
3. % Under 18
4. % Black
5. % Non-White
6. % Hispanic
7. % Non-Married Family Households
8. % Female Headed Households
9. % Single Parent Households
10. % Low Education level (less than high school)
11. % Renter Occupied
12. % Population under 18 in poverty
13. % Population struggling
14. % Not insured
15. % On public insurance
16. % Households with High household cost
Negative Correlation with child maltreatment counts
1. % Population with a College Education
2. % Own their home
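The ordering of variables in a clustered correlation heatmap such as Figure 3.2 can be sketched with agglomerative clustering. The linkage method and distance used in the analysis are not stated. This sketch assumes average linkage on the distance \(1 - r\), where \(r\) is the correlation coefficient; the function name and toy matrix are hypothetical.

```python
import numpy as np

def cluster_order(corr, names):
    """Order variables by average-linkage agglomerative clustering on distance 1 - r."""
    dist = 1.0 - np.asarray(corr, dtype=float)
    clusters = [[i] for i in range(len(names))]   # each cluster: ordered list of indices
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between members of the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]        # concatenation keeps members adjacent
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [names[i] for i in clusters[0]]

# Toy correlations: a and c move together, as do b and d.
corr = [[1.0, 0.1, 0.9, 0.0],
        [0.1, 1.0, 0.2, 0.8],
        [0.9, 0.2, 1.0, 0.1],
        [0.0, 0.8, 0.1, 1.0]]
order = cluster_order(corr, ["a", "b", "c", "d"])
```

Reordering rows and columns of the heatmap by `order` places strongly correlated variables next to each other, which produces the visible blocks.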
The following maps show the spatial distribution of a few selected variables that are correlated with child maltreatment counts at the tract level. They also show the child maltreatment rate, defined as the number of child maltreatment reports divided by the total population size (NL).
1. The Apriori Algorithm (https://en.wikipedia.org/wiki/Apriori_algorithm): the algorithm “proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.”